Conversation

@mapleFU mapleFU commented Oct 23, 2025

Which issue does this PR close?

Rationale for this change

Previously, gc() produced a single buffer. However, for buffers larger than 2GiB this was buggy, since a view's buffer offset is a 4-byte signed integer.

What changes are included in this PR?

Add a GcCopyGroup type and perform gc group by group.

Are these changes tested?

Yes

Are there any user-facing changes?

gc may now produce more than one buffer
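The splitting logic described above can be sketched as follows. GcCopyGroup mirrors the type this PR adds, but the exact fields and the plan_gc_groups helper here are illustrative assumptions, not the PR's code:

```rust
// Sketch of the grouping idea: walk the view lengths and start a new output
// data buffer whenever adding the next value would push the current buffer
// past i32::MAX bytes, since view offsets are 4-byte signed integers.
#[derive(Debug, PartialEq)]
struct GcCopyGroup {
    total_buffer_bytes: usize, // bytes of out-of-line data in this group
    total_len: usize,          // number of views covered by this group
}

fn plan_gc_groups(view_lengths: &[u32], max_inline_len: u32) -> Vec<GcCopyGroup> {
    let max = i32::MAX as u64;
    let mut groups = Vec::new();
    let (mut bytes, mut len) = (0u64, 0usize);
    for &l in view_lengths {
        // Inlined values occupy no space in the data buffer.
        let data_len = if l > max_inline_len { l as u64 } else { 0 };
        if bytes + data_len > max {
            groups.push(GcCopyGroup {
                total_buffer_bytes: bytes as usize,
                total_len: len,
            });
            bytes = 0;
            len = 0;
        }
        bytes += data_len;
        len += 1;
    }
    if len > 0 {
        groups.push(GcCopyGroup {
            total_buffer_bytes: bytes as usize,
            total_len: len,
        });
    }
    groups
}
```

With one i32::MAX-byte value followed by smaller ones, this produces two groups instead of a single oversized buffer.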

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 23, 2025
@mapleFU mapleFU requested a review from alamb October 26, 2025 13:20
@mapleFU mapleFU marked this pull request as ready for review October 26, 2025 13:20
.map(|i| unsafe { self.copy_view_to_buffer(i, &mut data_buf) })
.collect();
for view in self.views() {
let len = *view as u32;
Member Author

This part is slow, but it's correct. I can make it faster (by handling the views in groups or batches) if required.

Contributor

I am not sure how you would make this much faster - I think the code needs to find the locations to split in any event

Member Author


Even if the total buffer size is greater than i32::MAX, a single input buffer may be much smaller than i32::MAX, so this could find groups batch-by-batch rather than adding small buffers one by one?

Contributor


I see -- you are saying you could potentially optimize by looking at each input buffer and gc'ing it individually.

That would be interesting, though it would probably take a lot of care to make it fast.

}

#[test]
fn test_gc_huge_array() {
Member Author

This test requires about 5GiB of memory; it's huge, and I don't know whether it would affect testing on some machines.

Member Author

The previous code only hit the bug when a buffer exceeded 4GiB; the current code can be tested at > 2GiB. Personally I think testing at 2GiB is fine, but 4GiB is also OK with me; I'll leave it to the reviewer.

Contributor

I ran this test on my mac M3 and it takes 1.5 seconds so I think it is ok

running 1 test
test array::byte_view_array::tests::test_gc_huge_array ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 578 filtered out; finished in 1.58s

I also verified that without the code in this PR, the test fails like:

---- array::byte_view_array::tests::test_gc_huge_array stdout ----

thread 'array::byte_view_array::tests::test_gc_huge_array' panicked at arrow-array/src/array/byte_view_array.rs:1444:9:
assertion `left != right` failed: gc with huge buffer should not consolidate data into a single buffer
  left: 1
 right: 1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Member Author

I have removed the single buffer limitation :-(

Contributor

Sorry -- to be clear, I was confirming that the test covers the code added in this PR. I think the test is good.

mapleFU commented Oct 26, 2025

@alamb Besides, I hit this bug with a 4GiB StringViewArray; arrow-rs treats the offset as u32, but the Arrow standard uses i32, so I limit it to 2GiB.

There are other places that use u32::MAX in view handling; should I also fix them in another patch?

@mapleFU mapleFU changed the title View: Fixing gc on huge batch Fix: ViewType gc on huge batch would produce bad output Oct 26, 2025
};
vec![gc_copy_group]
};
assert!(gc_copy_groups.len() <= i32::MAX as usize);
Member Author

This assertion can be removed; I just want to ensure it holds.

Contributor

Maybe change this to a debug_assert here.
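The suggestion, in code (the surrounding helper function is hypothetical; only the invariant comes from the snippet above):

```rust
// debug_assert! is checked in debug/test builds but compiled out of release
// builds, so the group-count invariant is still verified by the test suite
// at no cost to optimized code; assert! would run everywhere.
fn check_group_count(num_groups: usize) {
    debug_assert!(num_groups <= i32::MAX as usize, "too many gc copy groups");
}
```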

@alamb alamb left a comment


Thank you @mapleFU -- this is a good find. I left some comments, let me know what you think

cc @zhuqi-lucas perhaps you have some thoughts


total_buffer_bytes: total_large,
total_len: len,
};
vec![gc_copy_group]
Contributor


I know it is probably a small thing, but could you please avoid this new allocation for the common case where there is a single buffer? Perhaps by creating the data bufs directly via a function call rather than a loop over the array

Member Author

This makes sense; a fixed-size array on the stack might be better in this scenario.


alamb commented Oct 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing gc-fixing-huge-batch-bug (142284c) to a7572eb diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=gc-fixing-huge-batch-bug
Results will be posted here when complete

alamb commented Oct 28, 2025

The MIRI test is probably failing due to the massive memory use in https://github.com/apache/arrow-rs/actions/runs/18818674867/job/53690752815?pr=8694

I suggest we don't run that test under miri by disabling it, with something like

    #[cfg_attr(miri, ignore)] // Takes too long

For example

https://github.com/apache/arrow-rs/blob/ba22a214b3c469da5466f60a74ab201a268bc0fc/arrow-array/src/array/boolean_array.rs#L789-L788

alamb commented Oct 28, 2025

🤖: Benchmark completed

Details

group                                             gc-fixing-huge-batch-bug               main
-----                                             ------------------------               ----
gc view types all without nulls[100000]           1.05  1600.4±85.77µs        ? ?/sec    1.00  1527.4±58.01µs        ? ?/sec
gc view types all without nulls[8000]             1.07     68.0±3.35µs        ? ?/sec    1.00     63.5±1.04µs        ? ?/sec
gc view types all[100000]                         1.16    296.3±9.94µs        ? ?/sec    1.00    255.3±1.11µs        ? ?/sec
gc view types all[8000]                           1.14     21.5±0.32µs        ? ?/sec    1.00     18.9±0.11µs        ? ?/sec
gc view types slice half without nulls[100000]    1.06   552.7±15.42µs        ? ?/sec    1.00    520.1±7.45µs        ? ?/sec
gc view types slice half without nulls[8000]      1.06     28.9±0.20µs        ? ?/sec    1.00     27.2±0.15µs        ? ?/sec
gc view types slice half[100000]                  1.15    145.5±3.36µs        ? ?/sec    1.00    126.3±0.63µs        ? ?/sec
gc view types slice half[8000]                    1.14     11.0±0.12µs        ? ?/sec    1.00      9.6±0.05µs        ? ?/sec
view types slice                                  1.00    709.6±1.75ns        ? ?/sec    1.00    706.8±1.32ns        ? ?/sec

@zhuqi-lucas zhuqi-lucas left a comment


Nice finding, thank you!


}
&groups
} else {
one_group.as_slice()
Member Author


I've added a one_group array for the gc-copy-group on the stack; hopefully this avoids the allocation:

let one_group = [GcCopyGroup {
    total_buffer_bytes: total_large,
    total_len: len,
}];

mapleFU commented Oct 31, 2025

@alamb I've run the new code on a macOS M4 Pro. I don't know whether the performance difference comes from an unstable environment or something else.

This patch:

view types slice        time:   [112.32 ns 112.44 ns 112.56 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild

gc view types all[100000]
                        time:   [246.89 µs 249.22 µs 251.63 µs]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild

gc view types slice half[100000]
                        time:   [79.469 µs 81.771 µs 84.060 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

gc view types all without nulls[100000]
                        time:   [406.50 µs 411.58 µs 417.23 µs]
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

gc view types slice half without nulls[100000]
                        time:   [373.40 µs 379.37 µs 385.93 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

gc view types all[8000] time:   [11.603 µs 11.663 µs 11.729 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

gc view types slice half[8000]
                        time:   [9.4269 µs 10.204 µs 11.012 µs]
Found 15 outliers among 100 measurements (15.00%)
  13 (13.00%) low severe
  2 (2.00%) low mild

gc view types all without nulls[8000]
                        time:   [27.001 µs 27.155 µs 27.368 µs]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

gc view types slice half without nulls[8000]
                        time:   [13.570 µs 13.594 µs 13.621 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  1 (1.00%) high severe

main:

view types slice        time:   [118.73 ns 121.33 ns 124.80 ns]
                        change: [+16.208% +20.497% +25.223%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

gc view types all[100000]
                        time:   [175.56 µs 177.37 µs 179.60 µs]
                        change: [-26.662% -25.460% -24.243%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

gc view types slice half[100000]
                        time:   [76.242 µs 76.520 µs 76.843 µs]
                        change: [-2.6612% -0.4692% +1.8536%] (p = 0.69 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

gc view types all without nulls[100000]
                        time:   [385.38 µs 393.14 µs 401.00 µs]
                        change: [-3.7465% -2.3347% -0.8310%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

gc view types slice half without nulls[100000]
                        time:   [422.70 µs 426.62 µs 430.63 µs]
                        change: [+2.9492% +5.2722% +7.6808%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

gc view types all[8000] time:   [16.079 µs 17.457 µs 18.815 µs]
                        change: [+47.350% +57.679% +66.216%] (p = 0.00 < 0.05)
                        Performance has regressed.

gc view types slice half[8000]
                        time:   [14.127 µs 15.144 µs 16.037 µs]
                        change: [+26.202% +34.352% +42.179%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) low mild

gc view types all without nulls[8000]
                        time:   [27.309 µs 28.280 µs 29.285 µs]
                        change: [-2.8581% -1.0174% +0.8766%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  11 (11.00%) high severe

gc view types slice half without nulls[8000]
                        time:   [15.009 µs 15.857 µs 16.825 µs]

@alamb alamb left a comment


Thank you @mapleFU - I went through this again and it looks good to me.

I would like to run the benchmarks one more time before merging this (I have queued them up on my machine so hopefully they'll post to this PR in a few hours)


alamb commented Nov 1, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing gc-fixing-huge-batch-bug (fdefa5f) to 2eabb59 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=gc-fixing-huge-batch-bug
Results will be posted here when complete

alamb commented Nov 1, 2025

🤖: Benchmark completed

Details

group                                             gc-fixing-huge-batch-bug               main
-----                                             ------------------------               ----
gc view types all without nulls[100000]           1.15  1826.5±78.25µs        ? ?/sec    1.00  1592.5±58.43µs        ? ?/sec
gc view types all without nulls[8000]             1.08     70.4±3.71µs        ? ?/sec    1.00     65.2±2.12µs        ? ?/sec
gc view types all[100000]                         1.12    311.7±8.19µs        ? ?/sec    1.00    277.7±3.63µs        ? ?/sec
gc view types all[8000]                           1.08     23.1±0.20µs        ? ?/sec    1.00     21.4±0.11µs        ? ?/sec
gc view types slice half without nulls[100000]    1.10   589.6±17.27µs        ? ?/sec    1.00   535.4±22.47µs        ? ?/sec
gc view types slice half without nulls[8000]      1.01     28.4±0.23µs        ? ?/sec    1.00     28.2±0.26µs        ? ?/sec
gc view types slice half[100000]                  1.11    155.6±2.98µs        ? ?/sec    1.00    140.0±3.30µs        ? ?/sec
gc view types slice half[8000]                    1.11     11.8±0.13µs        ? ?/sec    1.00     10.6±0.08µs        ? ?/sec
view types slice                                  1.00    707.2±1.74ns        ? ?/sec    1.00    705.9±3.69ns        ? ?/sec

if len > MAX_INLINE_VIEW_LEN {
if current_length + len > i32::MAX as u32 {
// Start a new group
groups.push(GcCopyGroup {
Contributor

I think you can preallocate groups (with_capacity) or use .collect
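The preallocation suggestion above amounts to sizing the output vector once instead of growing it with repeated push(). A standalone sketch with illustrative names (the helper and its group-splitting loop are assumptions, not the PR's code):

```rust
// An upper bound on the number of groups is known in advance: total data
// bytes divided by i32::MAX, plus one. Vec::with_capacity reserves that
// many slots once, so the push() loop never reallocates.
fn upper_bound_groups(total_buffer_bytes: u64) -> usize {
    (total_buffer_bytes / i32::MAX as u64) as usize + 1
}

fn make_groups(total_buffer_bytes: u64) -> Vec<u64> {
    let mut groups = Vec::with_capacity(upper_bound_groups(total_buffer_bytes));
    let mut remaining = total_buffer_bytes;
    while remaining > 0 {
        // Each group holds at most i32::MAX bytes of out-of-line data.
        let take = remaining.min(i32::MAX as u64);
        groups.push(take);
        remaining -= take;
    }
    groups
}
```

For a 5GiB input this yields three groups, matching the precomputed capacity, so no reallocation happens inside the loop.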

for view_idx in current_view_idx..current_view_idx + gc_copy_group.total_len {
let view =
unsafe { self.copy_view_to_buffer(view_idx, group_idx as i32, &mut data_buf) };
views_buf.push(view);
Contributor


This can use Vec::collect

Member Author


views_buf is preallocated once; would Vec::collect allocate more than the current approach?

views_buf.push(view);
}

data_blocks.push(Buffer::from_vec(data_buf));
Contributor


This could also use collect.


Labels

arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

Array: ViewType gc() has bug when array sum length exceed i32::MAX

4 participants